Sadreddini, M. H.
- Combining Different Seed Dictionaries to Extract Lexicon from Comparable Corpus
Authors
1 Department of Informatics, Università di Pisa, Pisa, IT
2 Department of Computer Science and Engineering, Shiraz University, Shiraz, IR
3 Department of Computer Science and Engineering, Shiraz University, IR
4 Department of Electronics, Informatics and Systems, University of Calabria, Rende, IT
Source
Indian Journal of Science and Technology, Vol 7, No 9 (2014), Pagination: 1279-1288
Abstract
In recent years, many studies have proposed extracting new bilingual lexicons from non-parallel (comparable) corpora. Nearly all of them apply an existing small dictionary or other resource to build an initial list called a seed dictionary. In this paper we discuss using different types of dictionaries, and their combinations, as the initial seed list to produce a bilingual Persian-Italian lexicon from a comparable corpus. Our experiments apply state-of-the-art techniques to four different seed dictionaries: an existing dictionary and three dictionaries created with a pivot-based schema, using English, Arabic and French as the pivot languages. An interesting challenge in our approach is combining different dictionaries to produce a better and more accurate lexicon. To this end we propose two novel combination models and examine their effect on comparable corpora collected from news agencies. The experimental results obtained by our implementation show the efficiency of the proposed combinations.
Keywords
Bilingual Lexicon, Comparable Corpus, Pivot Language
- A Robust Instance Weighting Technique for Nearest Neighbor Classification in Noisy Environments
Authors
1 Department of Computer Engineering, Shiraz Branch, Islamic Azad University, Shiraz, IR
2 Department of Computer Science and Engineering, Shiraz University, IR
Source
Indian Journal of Science and Technology, Vol 8, No 1 (2015), Pagination: 70-78
Abstract
The performance of the Nearest Neighbor (NN) classifier is highly dependent on the distance (or similarity) function used to find the NN of an input test pattern. Many proposed algorithms try to optimize the accuracy of the NN rule using a weighted distance function. In this scheme, a weight parameter is learned for each training instance; these weights are then used in the generalization phase to find the NN of an input test pattern. The Weighted Distance Nearest Neighbor (WDNN) algorithm attempts to maximize the leave-one-out classification rate on the training set by adjusting the weight parameters. This procedure, however, leads to weights that overfit the training data, which degrades the performance of the method, especially in noisy environments.
In this paper, we propose an enhanced version of WDNN, called Overfit Avoidance for WDNN (OAWDNN), that significantly outperforms the original algorithm in the generalization phase. The proposed method uses an early-stopping approach to decrease the instance weights specified by WDNN, which implicitly smooths the class boundary and consequently improves generalization.
In order to evaluate the robustness of the algorithm, class-label noise is added to a variety of UCI datasets. The experimental results show the superiority of the proposed method in generalization accuracy.
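To illustrate the instance-weighting scheme described in the abstract above, here is a minimal sketch of weighted-distance nearest-neighbor classification. The weighting form (dividing the Euclidean distance by each instance's weight, so higher-weighted instances attract more test points) and the toy data are assumptions for illustration; the papers learn the weights themselves, e.g. by maximizing leave-one-out accuracy (WDNN), with early stopping in OAWDNN.

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, weights, x_test):
    """Return the label of the training instance with the smallest
    weighted distance to x_test. The weighting form is an assumption:
    Euclidean distance divided by a per-instance weight, so instances
    with larger weights have a larger region of influence."""
    dists = np.linalg.norm(X_train - x_test, axis=1) / weights
    return y_train[np.argmin(dists)]

# Hypothetical toy data: two well-separated classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.1, 2.9]])
y = np.array([0, 0, 1, 1])
w = np.ones(len(X))  # uniform weights reduce this to plain 1-NN

print(weighted_nn_predict(X, y, w, np.array([0.2, 0.1])))  # -> 0
print(weighted_nn_predict(X, y, w, np.array([2.9, 3.2])))  # -> 1
```

With uniform weights this behaves exactly like ordinary 1-NN; lowering the weight of a noisy training instance shrinks its influence, which is the intuition behind decreasing WDNN's weights to smooth the class boundary.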